The scientific method is one of mankind's greatest creations. It allows us to make observations about the world around us and to distinguish fact from fiction, paving the way for great improvements in humanity's quality of life. Natural philosophy is well known as the predecessor of the scientific method. In fact, what we know today as science was simply called natural philosophy in the past; there used to be no distinction between the two.
Science is generally considered objective in its methods. Frameworks such as double-blind experimentation are in place to reduce errors caused by human mistakes or bias. Most people would also agree that a well-written scientific article in a journal is more objective than an article in a magazine. Can machine learning methods from natural language processing show whether natural philosophy is similarly objective compared to other schools of thought?
The History of Philosophy dataset will be used for the analysis, focusing specifically on Aristotle's body of work, since he wrote texts on natural philosophy as well as on other topics, such as logic and politics, that provide a point of comparison for objectivity. Sentences from his works will be put through a topic modelling algorithm to identify which ones belong to his works on natural philosophy, and then through a sentiment analysis algorithm that measures the subjectivity of each sentence.
#Setup
#Operations
import pandas as pd
import numpy as np
import sys
import os
#Visualization
import seaborn as sns
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from pprint import pprint
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis
#NLP Cleaning
import spacy
from spacy.lang.en import English
import en_core_web_md
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
#NLP Algorithms
import gensim
from gensim.models import CoherenceModel
import gensim.corpora as corpora
from textblob import TextBlob
#To allow for calling local functions
path_def='/Users/safiraraharjo/Documents/GitHub/ads-spring2023-project1-safiraharjo/'
sys.path.insert(0, path_def+'lib')
Topic modelling is the task of using unsupervised learning to extract the main topics (each represented as a set of words) that occur in a collection of documents. Here, the documents are sentences from philosophical texts, and the goal is to identify which sentences are likely to fall into the category of natural philosophy based on their topics. The analysis uses Latent Dirichlet Allocation (LDA), a statistical model that explains a set of observations (sentences) through unobserved groups (topics): each sentence is modelled as a mixture of topics, and each topic as a probability distribution over words, so sentences that draw on the same topics tend to share words. Before LDA can be run, the dataset first needs to be pre-processed.
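To make the LDA description concrete, the sketch below simulates LDA's generative story with the standard library: every topic is a word distribution, every sentence draws its own topic mixture, and each word is produced by first picking a topic and then a word from that topic. Fitting LDA, which gensim's `LdaMulticore` does on the real data, means working backwards from the observed words to these hidden distributions. The vocabulary and the `alpha`/`eta` values are arbitrary assumptions for illustration, and `generate_corpus` is a hypothetical helper.

```python
import random

def sample_dirichlet(alphas, rng):
    # A Dirichlet draw is a vector of independent Gamma(alpha_i, 1) draws, normalized to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(vocab, num_topics, num_docs, doc_len, alpha=0.5, eta=0.5, seed=0):
    rng = random.Random(seed)
    # Each topic is a probability distribution over the whole vocabulary.
    topics = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Each document (here: a sentence) gets its own mixture of topics.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=theta)[0]  # pick a topic for this word
            w = rng.choices(vocab, weights=topics[z])[0]          # pick a word from that topic
            words.append(w)
        docs.append(words)
    return topics, docs
```

For example, `generate_corpus(["animal", "water", "good", "law", "poet"], num_topics=2, num_docs=3, doc_len=4)` returns two word distributions and three four-word "sentences".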
#Read data
df = pd.read_csv(path_def+'data/philosophy_data.csv')
df.head()
| | title | author | school | sentence_spacy | sentence_str | original_publication_date | corpus_edition_date | sentence_length | sentence_lowered | tokenized_txt | lemmatized_str |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Plato - Complete Works | Plato | plato | What's new, Socrates, to make you leave your ... | What's new, Socrates, to make you leave your ... | -350 | 1997 | 125 | what's new, socrates, to make you leave your ... | ['what', 'new', 'socrates', 'to', 'make', 'you... | what be new , Socrates , to make -PRON- lea... |
| 1 | Plato - Complete Works | Plato | plato | Surely you are not prosecuting anyone before t... | Surely you are not prosecuting anyone before t... | -350 | 1997 | 69 | surely you are not prosecuting anyone before t... | ['surely', 'you', 'are', 'not', 'prosecuting',... | surely -PRON- be not prosecute anyone before ... |
| 2 | Plato - Complete Works | Plato | plato | The Athenians do not call this a prosecution b... | The Athenians do not call this a prosecution b... | -350 | 1997 | 74 | the athenians do not call this a prosecution b... | ['the', 'athenians', 'do', 'not', 'call', 'thi... | the Athenians do not call this a prosecution ... |
| 3 | Plato - Complete Works | Plato | plato | What is this you say? | What is this you say? | -350 | 1997 | 21 | what is this you say? | ['what', 'is', 'this', 'you', 'say'] | what be this -PRON- say ? |
| 4 | Plato - Complete Works | Plato | plato | Someone must have indicted you, for you are no... | Someone must have indicted you, for you are no... | -350 | 1997 | 101 | someone must have indicted you, for you are no... | ['someone', 'must', 'have', 'indicted', 'you',... | someone must have indict -PRON- , for -PRON- ... |
Texts need to go through a series of processing steps before they can be fed into the LDA model, including tokenization and lemmatization. Tokenization splits paragraphs and sentences into smaller units (tokens, usually words) that can more easily be assigned meaning. Lemmatization is a text normalization technique that reduces each inflected form of a word to its base form, or lemma (e.g. 'things' to 'thing', 'are' to 'be'). In addition, common stopwords such as 'the' and 'a', as well as pronouns, are removed, since they carry little information about a sentence's topic.
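The three steps can be sketched in plain Python. This is not the project's `cleaner` function, which relies on spaCy's full tokenizer, lemmatizer, and stopword list; here `STOPWORDS` and `LEMMAS` are tiny hypothetical tables that only cover the example sentences.

```python
import re

# Tiny illustrative stopword list and lemma table (hypothetical; spaCy's are far larger).
STOPWORDS = {"a", "an", "and", "anyone", "are", "be", "before", "both", "for", "he",
             "i", "is", "it", "not", "of", "she", "surely", "the", "they", "this",
             "thus", "to", "we", "what", "you", "your"}
LEMMAS = {"animals": "animal", "are": "be", "indicted": "indict", "is": "be",
          "prosecuting": "prosecute", "things": "thing"}

def clean_sentence(sentence):
    # 1. Tokenize: lowercase the sentence and split it into alphabetic word tokens.
    tokens = re.findall(r"[a-z]+", sentence.lower())
    # 2. Lemmatize: map each token to its base form when the table knows it.
    lemmas = [LEMMAS.get(tok, tok) for tok in tokens]
    # 3. Drop stopwords and pronouns, which carry little topical information.
    return [lem for lem in lemmas if lem not in STOPWORDS]
```

With these tables, `clean_sentence("Thus, for example, both a man and an ox are animals.")` gives `['example', 'man', 'ox', 'animal']`, the same list the notebook's cleaner produces for that sentence.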
from nlp_cleaning import cleaner
nlp = en_core_web_md.load()
cleaner(nlp,df,df.sentence_str)
| | title | author | school | sentence_spacy | sentence_str | original_publication_date | corpus_edition_date | sentence_length | sentence_lowered | tokenized_txt | lemmatized_str | cleaned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Plato - Complete Works | Plato | plato | What's new, Socrates, to make you leave your ... | What's new, Socrates, to make you leave your ... | -350 | 1997 | 125 | what's new, socrates, to make you leave your ... | ['what', 'new', 'socrates', 'to', 'make', 'you... | what be new , Socrates , to make -PRON- lea... | [new, socrates, leave, usual, haunt, lyceum, s... |
| 1 | Plato - Complete Works | Plato | plato | Surely you are not prosecuting anyone before t... | Surely you are not prosecuting anyone before t... | -350 | 1997 | 69 | surely you are not prosecuting anyone before t... | ['surely', 'you', 'are', 'not', 'prosecuting',... | surely -PRON- be not prosecute anyone before ... | [prosecute, king, archon] |
| 2 | Plato - Complete Works | Plato | plato | The Athenians do not call this a prosecution b... | The Athenians do not call this a prosecution b... | -350 | 1997 | 74 | the athenians do not call this a prosecution b... | ['the', 'athenians', 'do', 'not', 'call', 'thi... | the Athenians do not call this a prosecution ... | [athenians, prosecution, indictment, euthyphro] |
| 3 | Plato - Complete Works | Plato | plato | What is this you say? | What is this you say? | -350 | 1997 | 21 | what is this you say? | ['what', 'is', 'this', 'you', 'say'] | what be this -PRON- say ? | [] |
| 4 | Plato - Complete Works | Plato | plato | Someone must have indicted you, for you are no... | Someone must have indicted you, for you are no... | -350 | 1997 | 101 | someone must have indicted you, for you are no... | ['someone', 'must', 'have', 'indicted', 'you',... | someone must have indict -PRON- , for -PRON- ... | [indict, go, tell, indict] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 360803 | Women, Race, And Class | Davis | feminism | But the socialization of housework including m... | But the socialization of housework including m... | 1981 | 1981 | 142 | but the socialization of housework including m... | ['but', 'the', 'socialization', 'of', 'housewo... | but the socialization of housework include me... | [socialization, housework, include, meal, prep... |
| 360804 | Women, Race, And Class | Davis | feminism | The only significant steps toward endingdomest... | The only significant steps toward endingdomest... | 1981 | 1981 | 117 | the only significant steps toward endingdomest... | ['the', 'only', 'significant', 'steps', 'towar... | the only significant step toward endingdomest... | [significant, step, endingdomestic, slavery, f... |
| 360805 | Women, Race, And Class | Davis | feminism | Working women, therefore, have a special and v... | Working women, therefore, have a special and v... | 1981 | 1981 | 90 | working women, therefore, have a special and v... | ['working', 'women', 'therefore', 'have', 'spe... | working woman , therefore , have a special an... | [working, woman, special, vital, interest, str... |
| 360806 | Women, Race, And Class | Davis | feminism | Moreover, under capitalism, campaigns for jobs... | Moreover, under capitalism, campaigns for jobs... | 1981 | 1981 | 199 | moreover, under capitalism, campaigns for jobs... | ['moreover', 'under', 'capitalism', 'campaigns... | moreover , under capitalism , campaign for jo... | [capitalism, campaign, job, equal, basis, man,... |
| 360807 | Women, Race, And Class | Davis | feminism | This strategy calls into question the validity... | This strategy calls into question the validity... | 1981 | 1981 | 126 | this strategy calls into question the validity... | ['this', 'strategy', 'calls', 'into', 'questio... | this strategy call into question the validity... | [strategy, call, question, validity, monopoly,... |
360808 rows × 12 columns
The processed texts of Aristotle can be visualized in a word cloud. Words such as 'the' and 'a' no longer appear, and words like 'things' and 'thing' now appear in only one form.
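A word cloud draws each word at a size that grows with its frequency relative to the most frequent word. The counting behind that idea can be sketched with the standard library; `word_weights` is a hypothetical helper, and the wordcloud library's `WordCloud.generate` applies its own tokenization, stopword filtering, and font scaling, so this is only an approximation.

```python
import re
from collections import Counter

def word_weights(text):
    """Relative frequency of each word, scaled so the most common word gets 1.0."""
    # Split on whitespace and commas (the notebook joins sentences with commas).
    words = [w for w in re.split(r"[\s,]+", text) if w]
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # Larger weights correspond to larger words in the cloud.
    return {word: count / top for word, count in counts.items()}
```

For instance, `word_weights("thing animal,thing good")` weights "thing" at 1.0 and the other two words at 0.5.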
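#.copy() makes aristotle an independent DataFrame, so adding columns to it does not trigger SettingWithCopyWarning
aristotle = df[df.author == 'Aristotle'].copy()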
aristotle['cleaned_str'] = [' '.join(map(str, l)) for l in aristotle['cleaned']]
long_string = ','.join(list(aristotle['cleaned_str'].values))
wordcloud = WordCloud(background_color="white"
, contour_width=0.1
, contour_color="black"
, max_font_size=100
, random_state=42
, colormap="Dark2")
wordcloud.generate(long_string)
wordcloud.to_file(path_def+'figs/aristotle_wc.png')
wordcloud.to_image()
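The first modelling step is to remove words that appear in most of the sentences or that appear very infrequently. Words that appear in most sentences make it difficult to separate the text into topics, and words that appear very rarely are unlikely to be informative. In the final model below, for example, words that occur in fewer than 5 sentences or in more than half of all sentences are dropped, and only the 1,000 most frequent remaining words are kept.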
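The filtering rule can be sketched with the standard library. The function below mirrors the intent of the `id2word.filter_extremes(no_below=5, no_above=0.5, keep_n=1000)` call used in the final model, but it is a hypothetical stand-in: gensim works on its own `Dictionary` object, and its exact rounding and tie-breaking may differ.

```python
from collections import Counter

def filter_extremes(docs, no_below=5, no_above=0.5, keep_n=1000):
    """Drop very rare and very common words from tokenized documents (a sketch)."""
    # Document frequency: in how many documents (here, sentences) each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    max_docs = int(no_above * len(docs))
    # Keep words found in at least no_below documents and at most a no_above fraction of them.
    kept = [w for w, n in doc_freq.items() if no_below <= n <= max_docs]
    # Among the survivors, keep only the keep_n most widespread (ties broken alphabetically).
    kept.sort(key=lambda w: (-doc_freq[w], w))
    vocab = set(kept[:keep_n])
    return [[w for w in doc if w in vocab] for doc in docs]
```

On four toy sentences, a word appearing in three of them exceeds `no_above=0.6` and is dropped, while a word appearing in only one falls below `no_below=2`.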
The number of topics also needs to be determined. With prior knowledge of the texts, the number of topics can often be set without further analysis (e.g. if the texts are articles from 5 different magazines, roughly 5 topics can be expected). In this case, however, the number of topics will be chosen by comparing coherence scores.
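Two coherence measures are used in this analysis. C_V, one of the most popular, estimates how often a topic's top words co-occur within a sliding window over the texts and combines normalized pointwise mutual information (NPMI) with cosine similarity; higher values indicate more coherent topics. UMass coherence instead counts how often pairs of a topic's top words appear in the same document, relative to the document frequency of one word of the pair. Because it sums logarithms of these ratios, it is usually negative, and values closer to zero indicate more coherent topics.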
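The UMass score for a single topic can be sketched directly from its original definition (Mimno et al., 2011): for each pair of top words, take the log of their co-document count plus one, divided by the document count of the higher-ranked word. `umass_coherence` is a hypothetical helper; gensim's `CoherenceModel(coherence='u_mass')` may differ in details such as smoothing and averaging.

```python
import math

def umass_coherence(top_words, docs):
    """UMass coherence of one topic; top_words are ordered from most to least probable.

    Assumes every top word occurs in at least one document (true for words taken from the corpus).
    """
    doc_sets = [set(d) for d in docs]

    def doc_count(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            # +1 smoothing avoids log(0) when a pair never co-occurs.
            score += math.log((doc_count(top_words[m], top_words[l]) + 1) / doc_count(top_words[l]))
    return score
```

On a toy corpus where "water" always appears alongside "animal" but "good" never does, the topic `["animal", "water"]` scores 0.0 while `["animal", "good"]` scores log(1/3), so the first topic is judged more coherent.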
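In this analysis, these steps (filtering out extreme words, then fitting an LDA model and computing its coherence score for a range of topic counts) are wrapped in the coherence_score_viz function.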
from coherence_modelling import coherence_score_viz
coherence_score_viz(aristotle['cleaned'],'u_mass',1,11,corpora,gensim.models.LdaMulticore,plt,CoherenceModel,path_def)
coherence_score_viz(aristotle['cleaned'],'c_v',1,11,corpora,gensim.models.LdaMulticore,plt,CoherenceModel,path_def)
With C_V, the coherence score rises sharply at 4 topics. With UMass, coherence drops sharply between 2 and 4 topics. Since 2 topics seem too general, 4 seems a reasonable choice. However, because Aristotle's body of work spans natural philosophy, theoretical philosophy (logic and metaphysics), and practical philosophy (politics and economics), 5 topics were chosen instead, as going to 5 topics did not cause a large drop in coherence.
#Number of topics
num_topics = 5
#Final LDA model after determining number of topics
id2word = corpora.Dictionary(aristotle['cleaned'])
id2word.filter_extremes(no_below=5, no_above=0.5, keep_n=1000)
corpus = [id2word.doc2bow(doc) for doc in aristotle['cleaned']]
lda_model = gensim.models.LdaMulticore(corpus=corpus,
id2word=id2word,
num_topics=num_topics,
random_state=999)
#Print the top 10 keywords of each of the 5 topics
pprint(lda_model.print_topics())
[(0, '0.015*"come" + 0.015*"thing" + 0.013*"tragedy" + 0.013*"say" + 0.011*"great" + 0.011*"time" + 0.010*"point" + 0.010*"form" + 0.009*"state" + 0.008*"speech"'),
 (1, '0.018*"animal" + 0.013*"small" + 0.013*"large" + 0.012*"kind" + 0.011*"water" + 0.011*"contrary" + 0.010*"form" + 0.010*"good" + 0.010*"sense" + 0.010*"reason"'),
 (2, '0.018*"poet" + 0.018*"man" + 0.017*"animal" + 0.014*"belong" + 0.014*"act" + 0.014*"body" + 0.011*"change" + 0.011*"state" + 0.010*"female" + 0.009*"male"'),
 (3, '0.021*"thing" + 0.016*"think" + 0.013*"part" + 0.013*"way" + 0.011*"cause" + 0.011*"matter" + 0.011*"place" + 0.010*"take" + 0.010*"number" + 0.009*"time"'),
 (4, '0.046*"man" + 0.035*"thing" + 0.033*"good" + 0.013*"case" + 0.010*"excellence" + 0.010*"mean" + 0.009*"say" + 0.009*"bad" + 0.009*"law" + 0.008*"kind"')]
Based on their keywords, topics 1, 2, and 3 look like they could fall under natural philosophy (e.g. 'animal', 'water', 'body', 'change', 'cause', 'matter').
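#Appending the highest probability topic to each sentence
all_topics = lda_model.get_document_topics(corpus, minimum_probability=0.0)
#One row per sentence and one column per topic; num_terms fixes the number of columns at num_topics
all_topics_csr = gensim.matutils.corpus2csc(all_topics, num_terms=num_topics)
all_topics_numpy = all_topics_csr.T.toarray()
all_topics_df = pd.DataFrame(all_topics_numpy)
topic_id_list = all_topics_df.idxmax(axis=1)
#Assign by position: aristotle keeps its original row labels from df (starting at 38366), so assigning the Series directly would align on the index and attach topics to the wrong sentences
aristotle['topics'] = topic_id_list.to_numpy()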
#Cleaning to create wordcloud
topic_wordcloud = aristotle.groupby('topics')['cleaned_str'].apply(list)
topic_wordcloud[:] = [' '.join(map(str, l)) for l in topic_wordcloud[:]]
topic_wordcloud_df=pd.DataFrame(topic_wordcloud)
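#Remove the very frequent word 'thing' as a whole word (words here are separated by spaces, not commas)
topic_wordcloud_df['cleaned_str'] = topic_wordcloud_df['cleaned_str'].str.replace(r"\bthing\b", "", regex=True)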
#Displaying wordcloud of all the topics
sns.set()
plt.rcParams['figure.figsize'] = [90, 90]
wordcloud = WordCloud(background_color="white", contour_width=0.1,
                      contour_color="black", max_font_size=100, random_state=42,
                      colormap="Dark2")
#One word cloud per topic, arranged in a 5 x 2 grid
for i in range(5):
    wordcloud.generate(text=topic_wordcloud_df['cleaned_str'][i])
    plt.subplot(5, 2, i+1)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(topic_wordcloud_df.index[i], fontdict={'fontsize': 60})
plt.savefig(path_def+'figs/topic_wc.png')
plt.show()
The word clouds show words like 'air' in topic 1, 'genus' in topic 2, and 'animal' in topic 3. It is still difficult to determine which topics fall under natural philosophy, since they share many words.
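#Visualizing intertopic distance and relevant terms of topic
#Note: pyLDAvis numbers topics from 1 and, by default (sort_topics=True), orders them by prevalence, so its topic numbers may not match gensim's 0-based topic ids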
pyLDAvis.enable_notebook()
p = gensimvis.prepare(lda_model, corpus, id2word)
pyLDAvis.save_html(p, path_def+'figs/pyLDAvis.html')
p
Finally, in the intertopic distance map, topic 2 lies far from all the other topics, and its most relevant keywords, such as 'animal', 'bird', 'water', and 'heat', strongly suggest scientific subject matter. Sentences whose most probable topic is topic 2 are therefore the focus of the next part of the analysis.
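Selecting those sentences amounts to taking, for each sentence, the topic with the highest probability and keeping the sentences whose winner is the chosen topic. The notebook does this with `idxmax` on a DataFrame; the sketch below shows the same logic in plain Python on the `(topic_id, probability)` pairs that gensim's `get_document_topics` yields per document. `dominant_topic` and `sentences_in_topic` are hypothetical helpers, and ties go to the first topic listed.

```python
def dominant_topic(doc_topics):
    # doc_topics is a list of (topic_id, probability) pairs for one document.
    # max() returns the first pair with the highest probability, so ties go to the earlier topic.
    return max(doc_topics, key=lambda pair: pair[1])[0]

def sentences_in_topic(sentences, all_doc_topics, topic_id):
    # Pair sentences and topic distributions by position, then keep the sentences
    # whose single most probable topic is topic_id.
    return [s for s, dt in zip(sentences, all_doc_topics) if dominant_topic(dt) == topic_id]
```

For example, a sentence with distribution `[(0, 0.1), (1, 0.2), (2, 0.7)]` is assigned to topic 2.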
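Subjectivity analysis investigates attitudes, feelings, and opinions expressed in a text. In its basic form, it classifies a text as subjective (opinion) or objective (fact). The TextBlob library used in this analysis scores subjectivity on a scale from 0 to 1, where 0 is the most objective and 1 the most subjective. TextBlob's default analyzer is lexicon-based: it roughly averages the subjectivity values of the words in a sentence that appear in its lexicon (mostly adjectives), so a sentence containing none of those words scores 0.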
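The lexicon-averaging idea can be sketched in a few lines. `LEXICON` below is a hypothetical mini-lexicon with made-up scores; TextBlob's real lexicon is far larger and also accounts for modifiers such as "very" and for negation, so its numbers will differ. The input is assumed to be cleaned, lowercase, space-separated text like the notebook's `cleaned_str` column.

```python
# Hypothetical mini-lexicon: word -> subjectivity in [0, 1] (made-up values).
LEXICON = {"great": 0.75, "good": 0.6, "serious": 0.67, "negative": 0.7}

def subjectivity_score(text):
    # Average the scores of the words found in the lexicon; all other words are ignored.
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    # A text with no lexicon words is treated as fully objective.
    return sum(scores) / len(scores) if scores else 0.0
```

This explains why a cleaned sentence such as "example man ox animal" scores 0.0: none of its words carries an opinion in the lexicon.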
from subjectivity_modelling import subjectivity
subjectivity(aristotle, 12, TextBlob)
| | title | author | school | sentence_spacy | sentence_str | original_publication_date | corpus_edition_date | sentence_length | sentence_lowered | tokenized_txt | lemmatized_str | cleaned | cleaned_str | topics | subjectivity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38366 | Aristotle - Complete Works | Aristotle | aristotle | When things have only a name in common and th... | When things have only a name in common and th... | -320 | 1991 | 139 | when things have only a name in common and th... | ['when', 'things', 'have', 'only', 'name', 'in... | when thing have only a name in common and t... | [thing, common, definition, correspond, differ... | thing common definition correspond different c... | 4.0 | 0.550000 |
| 38367 | Aristotle - Complete Works | Aristotle | aristotle | Thus, for example, both a man and a picture ar... | Thus, for example, both a man and a picture ar... | -320 | 1991 | 56 | thus, for example, both a man and a picture ar... | ['thus', 'for', 'example', 'both', 'man', 'and... | thus , for example , both a man and a picture... | [example, man, picture, animal] | example man picture animal | 4.0 | 0.000000 |
| 38368 | Aristotle - Complete Works | Aristotle | aristotle | These have only a name in common and the defin... | These have only a name in common and the defin... | -320 | 1991 | 207 | these have only a name in common and the defin... | ['these', 'have', 'only', 'name', 'in', 'commo... | these have only a name in common and the defi... | [common, definition, correspond, different, an... | common definition correspond different animal ... | 4.0 | 0.466667 |
| 38369 | Aristotle - Complete Works | Aristotle | aristotle | When things have the name in common and the de... | When things have the name in common and the de... | -320 | 1991 | 134 | when things have the name in common and the de... | ['when', 'things', 'have', 'the', 'name', 'in'... | when thing have the name in common and the de... | [thing, common, definition, correspond, call, ... | thing common definition correspond call synony... | 0.0 | 0.500000 |
| 38370 | Aristotle - Complete Works | Aristotle | aristotle | Thus, for example, both a man and an ox are an... | Thus, for example, both a man and an ox are an... | -320 | 1991 | 52 | thus, for example, both a man and an ox are an... | ['thus', 'for', 'example', 'both', 'man', 'and... | thus , for example , both a man and an ox be ... | [example, man, ox, animal] | example man ox animal | 4.0 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 87140 | Aristotle - Complete Works | Aristotle | aristotle | ; which is a great advantage, since the more c... | ; which is a great advantage, since the more c... | -320 | 1991 | 137 | ; which is a great advantage, since the more c... | ['which', 'is', 'great', 'advantage', 'since',... | ; which be a great advantage , since the more... | [great, advantage, concentrated, effect, pleas... | great advantage concentrated effect pleasurabl... | NaN | 0.589286 |
| 87141 | Aristotle - Complete Works | Aristotle | aristotle | consider the Oedipus of Sophocles, for instanc... | consider the Oedipus of Sophocles, for instanc... | -320 | 1991 | 118 | consider the oedipus of sophocles, for instanc... | ['consider', 'the', 'oedipus', 'of', 'sophocle... | consider the Oedipus of Sophocles , for insta... | [consider, oedipus, sophocles, instance, effec... | consider oedipus sophocles instance effect exp... | NaN | 0.000000 |
| 87142 | Aristotle - Complete Works | Aristotle | aristotle | There is less unity in the imitation of the ep... | There is less unity in the imitation of the ep... | -320 | 1991 | 308 | there is less unity in the imitation of the ep... | ['there', 'is', 'less', 'unity', 'in', 'the', ... | there be less unity in the imitation of the e... | [unity, imitation, epic, poet, prove, fact, wo... | unity imitation epic poet prove fact work thei... | NaN | 0.342857 |
| 87143 | Aristotle - Complete Works | Aristotle | aristotle | In saying that there is less unity in an epic,... | In saying that there is less unity in an epic,... | -320 | 1991 | 333 | in saying that there is less unity in an epic,... | ['in', 'saying', 'that', 'there', 'is', 'less'... | in say that there be less unity in an epic , ... | [say, unity, epic, mean, epic, plurality, acti... | say unity epic mean epic plurality action way ... | NaN | 0.473438 |
| 87144 | Aristotle - Complete Works | Aristotle | aristotle | If, then, tragedy is superior in these respect... | If, then, tragedy is superior in these respect... | -320 | 1991 | 324 | if, then, tragedy is superior in these respect... | ['if', 'then', 'tragedy', 'is', 'superior', 'i... | if , then , tragedy be superior in these resp... | [tragedy, superior, respect, poetic, effect, f... | tragedy superior respect poetic effect form po... | NaN | 0.649345 |
48779 rows × 15 columns
aristotle.groupby(['topics'])['subjectivity'].mean()
topics
0.0    0.315566
1.0    0.325995
2.0    0.327974
3.0    0.315465
4.0    0.335072
Name: subjectivity, dtype: float64
The average subjectivity barely differs between topic groups, ranging only from about 0.315 to 0.335. Topic 2, which seems closest to natural philosophy, scores 0.328, close to the average of the other groups.
df['cleaned_str'] = [' '.join(map(str, l)) for l in df['cleaned']]
subjectivity(df, 12, TextBlob)
| | title | author | school | sentence_spacy | sentence_str | original_publication_date | corpus_edition_date | sentence_length | sentence_lowered | tokenized_txt | lemmatized_str | cleaned | cleaned_str | subjectivity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Plato - Complete Works | Plato | plato | What's new, Socrates, to make you leave your ... | What's new, Socrates, to make you leave your ... | -350 | 1997 | 125 | what's new, socrates, to make you leave your ... | ['what', 'new', 'socrates', 'to', 'make', 'you... | what be new , Socrates , to make -PRON- lea... | [new, socrates, leave, usual, haunt, lyceum, s... | new socrates leave usual haunt lyceum spend ti... | 0.352273 |
| 1 | Plato - Complete Works | Plato | plato | Surely you are not prosecuting anyone before t... | Surely you are not prosecuting anyone before t... | -350 | 1997 | 69 | surely you are not prosecuting anyone before t... | ['surely', 'you', 'are', 'not', 'prosecuting',... | surely -PRON- be not prosecute anyone before ... | [prosecute, king, archon] | prosecute king archon | 0.000000 |
| 2 | Plato - Complete Works | Plato | plato | The Athenians do not call this a prosecution b... | The Athenians do not call this a prosecution b... | -350 | 1997 | 74 | the athenians do not call this a prosecution b... | ['the', 'athenians', 'do', 'not', 'call', 'thi... | the Athenians do not call this a prosecution ... | [athenians, prosecution, indictment, euthyphro] | athenians prosecution indictment euthyphro | 0.000000 |
| 3 | Plato - Complete Works | Plato | plato | What is this you say? | What is this you say? | -350 | 1997 | 21 | what is this you say? | ['what', 'is', 'this', 'you', 'say'] | what be this -PRON- say ? | [] | | 0.000000 |
| 4 | Plato - Complete Works | Plato | plato | Someone must have indicted you, for you are no... | Someone must have indicted you, for you are no... | -350 | 1997 | 101 | someone must have indicted you, for you are no... | ['someone', 'must', 'have', 'indicted', 'you',... | someone must have indict -PRON- , for -PRON- ... | [indict, go, tell, indict] | indict go tell indict | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 360803 | Women, Race, And Class | Davis | feminism | But the socialization of housework including m... | But the socialization of housework including m... | 1981 | 1981 | 142 | but the socialization of housework including m... | ['but', 'the', 'socialization', 'of', 'housewo... | but the socialization of housework include me... | [socialization, housework, include, meal, prep... | socialization housework include meal preparati... | 0.000000 |
| 360804 | Women, Race, And Class | Davis | feminism | The only significant steps toward endingdomest... | The only significant steps toward endingdomest... | 1981 | 1981 | 117 | the only significant steps toward endingdomest... | ['the', 'only', 'significant', 'steps', 'towar... | the only significant step toward endingdomest... | [significant, step, endingdomestic, slavery, f... | significant step endingdomestic slavery fact t... | 0.875000 |
| 360805 | Women, Race, And Class | Davis | feminism | Working women, therefore, have a special and v... | Working women, therefore, have a special and v... | 1981 | 1981 | 90 | working women, therefore, have a special and v... | ['working', 'women', 'therefore', 'have', 'spe... | working woman , therefore , have a special an... | [working, woman, special, vital, interest, str... | working woman special vital interest struggle ... | 0.485714 |
| 360806 | Women, Race, And Class | Davis | feminism | Moreover, under capitalism, campaigns for jobs... | Moreover, under capitalism, campaigns for jobs... | 1981 | 1981 | 199 | moreover, under capitalism, campaigns for jobs... | ['moreover', 'under', 'capitalism', 'campaigns... | moreover , under capitalism , campaign for jo... | [capitalism, campaign, job, equal, basis, man,... | capitalism campaign job equal basis man combin... | 0.438889 |
| 360807 | Women, Race, And Class | Davis | feminism | This strategy calls into question the validity... | This strategy calls into question the validity... | 1981 | 1981 | 126 | this strategy calls into question the validity... | ['this', 'strategy', 'calls', 'into', 'questio... | this strategy call into question the validity... | [strategy, call, question, validity, monopoly,... | strategy call question validity monopoly capit... | 0.000000 |
360808 rows × 14 columns
df.groupby('school').agg({'subjectivity': ['mean']})
| school | mean subjectivity |
|---|---|
| analytic | 0.286684 |
| aristotle | 0.341157 |
| capitalism | 0.367580 |
| communism | 0.290665 |
| continental | 0.314276 |
| empiricism | 0.357706 |
| feminism | 0.361778 |
| german_idealism | 0.312720 |
| nietzsche | 0.333113 |
| phenomenology | 0.283388 |
| plato | 0.319806 |
| rationalism | 0.366719 |
| stoicism | 0.329450 |
Running the analysis on the whole dataset shows that the subjectivity of Aristotle's works (0.341) falls near the middle of the range of school averages, which runs from 0.283 to 0.368. Among the notably more objective schools are analytic philosophy (0.287) and phenomenology (0.283).
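To put Aristotle's position in numbers, the school means from the table above can be ranked and compared with their median. The values are copied from that table; nothing else is assumed.

```python
import statistics

# Mean subjectivity per school, copied from the table above.
school_means = {
    "analytic": 0.286684, "aristotle": 0.341157, "capitalism": 0.367580,
    "communism": 0.290665, "continental": 0.314276, "empiricism": 0.357706,
    "feminism": 0.361778, "german_idealism": 0.312720, "nietzsche": 0.333113,
    "phenomenology": 0.283388, "plato": 0.319806, "rationalism": 0.366719,
    "stoicism": 0.329450,
}

# Rank schools from most objective (lowest mean) to most subjective.
ranked = sorted(school_means, key=school_means.get)
aristotle_rank = ranked.index("aristotle") + 1
median = statistics.median(school_means.values())
```

Aristotle ranks 9th of 13 (1 being the most objective), slightly above the median school mean of 0.329 (stoicism).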
As a fun exercise, the same subjectivity analysis is applied to a dataset of 10,000 abstracts of COVID-19 research papers, to see whether modern research papers are any more objective than philosophical texts.
abstracts = pd.read_csv(path_def+'data/covid_abstracts.csv')
abstracts.head()
| | title | abstract | url |
|---|---|---|---|
| 0 | Real-World Experience with COVID-19 Including... | This article summarizes the experiences of COV... | https://pubmed.ncbi.nlm.nih.gov/35008137 |
| 1 | Successful outcome of pre-engraftment COVID-19... | Coronavirus disease 2019 COVID-19 caused by... | https://pubmed.ncbi.nlm.nih.gov/35008104 |
| 2 | The impact of COVID-19 on oncology professiona... | BACKGROUND COVID-19 has had a significant imp... | https://pubmed.ncbi.nlm.nih.gov/35007996 |
| 3 | ICU admission and mortality classifiers for CO... | The coronavirus disease 2019 COVID-19 which ... | https://pubmed.ncbi.nlm.nih.gov/35007991 |
| 4 | Clinical evaluation of nasopharyngeal midturb... | In the setting of supply chain shortages of na... | https://pubmed.ncbi.nlm.nih.gov/35007959 |
cleaner(nlp,abstracts,abstracts.abstract)
abstracts['cleaned_str'] = [' '.join(map(str, l)) for l in abstracts['cleaned']]
subjectivity(abstracts, 4, TextBlob)
abstracts['subjectivity'].mean()
0.4210568621231637
Surprisingly, the abstracts of modern research papers are more subjective, with an average score of 0.42, higher than any of the philosophical schools above. Looking further into the dataset, one abstract contains the sentence "The impact of the coronavirus disease 2019 (COVID-19) pandemic on well-being has the potential for serious negative consequences on work, home life, and patient care", which relies on subjective language. While most people would agree that the sentence is factual, whether consequences are serious is subjective, as what one person considers serious may not be considered serious by others. Conversely, a sentence by Aristotle such as "Thus, for example, both a man and an ox are animals" is factually correct and uses no subjective language at all (it scores 0). Modern research is often geared towards getting people to take action, such as wearing a mask, and emotionally charged, subjective language can be effective alongside factual evidence.
It can be concluded that even when the methods behind a text are objective, the language it uses is not necessarily objective, regardless of its topic.